Notebook to explore clean data


In [2]:
%matplotlib inline
from __future__ import division
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import cPickle as pickle
import gzip


/Users/Home/anaconda/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')

Wind generation

ERCOT provides hourly data on the aggregate level of wind power across the state from 2007-2015. The data files also contain total ERCOT load in MW and the total installed capacity of wind turbines (MW). With this data we can see how ERCOT demand evolved over the 8 years, how wind generation increased, and the change in how much of the total load was met with wind.


In [2]:
filename = 'ERCOT wind data.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)

In [3]:
ercot = pd.read_csv(fullpath, index_col=0)

In [132]:
ercot.head()


Out[132]:
ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change
2007-01-01 00:00:00 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN
2007-01-01 01:00:00 30133.0 2790.0 922.6 33.068100 3.061760 -151.4 -14.096834
2007-01-01 02:00:00 29941.0 2790.0 849.2 30.437276 2.836245 -73.4 -7.955777
2007-01-01 03:00:00 29949.0 2790.0 1056.3 37.860215 3.526996 207.1 24.387659
2007-01-01 04:00:00 30248.0 2790.0 837.1 30.003584 2.767456 -219.2 -20.751680

Distribution of wind output as % of load for the whole dataset.


In [5]:
sns.distplot(ercot['Wind Output, % of Load'])


Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x1167d9a90>

How does this distribution change over time? Need to set a year column.


In [5]:
#Define year and month for the visualizations below
ercot['month'] = pd.to_datetime(ercot.index).month
ercot['year'] = pd.to_datetime(ercot.index).year

The figures below shows a clear shift in the amount of generation from wind over time. In 2007 it was rarely above 10%, and the lowest bin is ~40% of the total. The distribution flattens considerably by 2010, and by 2013 starts creeping above 30% of total load.


In [7]:
sns.violinplot(x='year', y='Wind Output, % of Load', data=ercot, cut=0, linewidth=1.25)

# path = os.path.join('..', 'Midterm Report', 'Wind violin plot.svg')
# plt.savefig(path)


Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x1167cc150>

In [7]:
g = sns.FacetGrid(ercot, col='year', col_wrap=3)
g.map(sns.distplot, 'Wind Output, % of Load')


Out[7]:
<seaborn.axisgrid.FacetGrid at 0x115c37690>

The figures below look at total ERCOT load.

This figure shows that average load across an entire year has been increasing over time.


In [54]:
sns.pointplot('year', 'ERCOT Load, MW', data=ercot)


Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x1339133d0>

In [41]:
sns.pointplot('month', 'ERCOT Load, MW', data=ercot, hue='year', 
              palette='Blues', scale=0.5)


Out[41]:
<matplotlib.axes._subplots.AxesSubplot at 0x1262d4dd0>

Interesting. There is far less of a monthly pattern to the wind generation.


In [10]:
fig, (ax1, ax2) = plt.subplots(1,2,figsize=(10,4))
sns.pointplot('month', 'ERCOT Load, MW', data=ercot, hue='year', 
              palette='Blues', scale=0.5, ax=ax1)
sns.pointplot('month', 'Total Wind Output, MW', data=ercot, hue='year', 
              palette='Greens', scale=0.5, ax=ax2)
ax2.legend(bbox_to_anchor=(1, 1), loc=2, title='year')

plt.tight_layout()

path = os.path.join('..', 'Midterm Report', 'Monthly ERCOT load and wind2.svg')
plt.savefig(path, bbox_inches='tight')


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x1166e5ad0>
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x119798f50>
Out[10]:
<matplotlib.legend.Legend at 0x119702790>

There's a definite dip in wind output in the late summer/early fall (July-Sept). The dip has been getting shallower over time though. Oddly, there was a huge drop in wind output in March 2015.


In [49]:
sns.pointplot('month', 'Wind Output, % of Installed', data=ercot, hue='year',
             palette='Blues', scale=0.5)
plt.legend(bbox_to_anchor=(1, 1), loc=2, title='year')


Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x130a450d0>
Out[49]:
<matplotlib.legend.Legend at 0x130a38390>

With the exception of 2007, almost all capacity additions happen in the second half of the year.


In [8]:
sns.pointplot('month', 'Total Wind Installed, MW', data=ercot, hue='year',
             palette='Blues', scale=0.5)
plt.legend(bbox_to_anchor=(1, 1), loc=2, title='year')
path = os.path.join('..', 'Midterm Report', 'Monthly ERCOT wind capacity.svg')
plt.savefig(path, bbox_inches='tight')


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x119c256d0>
Out[8]:
<matplotlib.legend.Legend at 0x119bd0f50>

These distribution plots show hourly ERCOT load for each year. It's been slowly creeping up, with less of a peak around 30GW.


In [42]:
g = sns.FacetGrid(ercot, col='year', col_wrap=3)
g.map(sns.distplot,'ERCOT Load, MW')
g.set_xticklabels(rotation=30)


Out[42]:
<seaborn.axisgrid.FacetGrid at 0x12af66d50>
Out[42]:
<seaborn.axisgrid.FacetGrid at 0x12af66d50>

EIA 860 data

Not sure what I'm seeing here yet. Some huge plants, but most of the rest are below 500MW? Most generation is NG, at least when using the largest amount of fuel in each plant as the basis for classification. (Looking at the EIA website, this is probably correct)


In [3]:
filename = 'EIA 860.pkl'
path = '../Clean Data'
fullpath = os.path.join(path, filename)

eia860 = pickle.load(open(fullpath, 'rb'))

In [16]:
eia860.keys()


Out[16]:
[2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]

In [4]:
all860 = pd.concat(eia860)

In [5]:
filename = 'All EIA 860.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
all860.to_csv(fullpath)

In [4]:
all860.head()


Out[4]:
PLANT_ID NAMEPLATE_CAPACITY(MW)
2007 0 2 45.0
1 3 2671.4
2 4 225.0
3 7 138.0
4 8 1416.7

In [5]:
all860['year'] = all860.index.get_level_values(0)

In [6]:
all860.reset_index(drop=True, inplace=True)

In [33]:
sns.boxplot(x='year', y='NAMEPLATE_CAPACITY(MW)', data=all860)


Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x11b33bb10>

In [30]:
sns.violinplot(x='year', y='NAMEPLATE_CAPACITY(MW)', data=all860, cut=0)


Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x11610bbd0>

A test of using fuel classifications from 860 to sum capacity by fuel


In [7]:
df = eia860[2007]

In [12]:
df = df.merge(eia923[2007][['PLANT_ID', 'fuel', 'NET_GENERATION_(MEGAWATTHOURS)']], on='PLANT_ID')

In [103]:
df.groupby('fuel').sum()['NAMEPLATE_CAPACITY(MW)'].plot(kind='bar')
plt.ylabel('Nameplate Capacity in 2007 (MW)')


Out[103]:
<matplotlib.axes._subplots.AxesSubplot at 0x11f9d37d0>
Out[103]:
<matplotlib.text.Text at 0x11fa07410>

Of the fuels shown the plot above, we only need to look at:

  • DFO (diesel fuel)
  • LIG (lignite coal)
  • NG (natural gas)
  • PC (petroleum coke)
  • SUB (subbituminous coal)

In [13]:
df = df[df['fuel'].isin(['DFO', 'LIG', 'NG', 'PC', 'SUB'])]

In [105]:
df.head()


Out[105]:
PLANT_ID NAMEPLATE_CAPACITY(MW) fuel NET_GENERATION_(MEGAWATTHOURS)
0 127 720.0 SUB 4.205893e+06
1 298 1849.8 LIG 1.359215e+07
5 3439 187.2 NG 7.943600e+05
6 3452 927.5 NG 5.711710e+05
7 3453 958.3 NG 3.782020e+05

In [14]:
def CF(row):
    possible_gen = row['NAMEPLATE_CAPACITY(MW)'] * 8760
    cf = row['NET_GENERATION_(MEGAWATTHOURS)'] / possible_gen
    return cf

df['Capacity Factor'] = df.apply(CF, axis=1)

In [15]:
df.head()


Out[15]:
PLANT_ID NAMEPLATE_CAPACITY(MW) fuel NET_GENERATION_(MEGAWATTHOURS) Capacity Factor
0 127 720.0 SUB 4.205893e+06 0.666840
1 298 1849.8 LIG 1.359215e+07 0.838802
5 3439 187.2 NG 7.943600e+05 0.484404
6 3452 927.5 NG 5.711710e+05 0.070299
7 3453 958.3 NG 3.782020e+05 0.045052

Apparently one of the plants has something weird going on - tiny nameplate capacity and a CF of 11. And another one has negative net generation?


In [124]:
g = sns.FacetGrid(df, hue='fuel', size=5, aspect=1.5)
g.map(plt.scatter, 'NAMEPLATE_CAPACITY(MW)', 'Capacity Factor')
plt.legend()


Out[124]:
<seaborn.axisgrid.FacetGrid at 0x12280e7d0>
Out[124]:
<matplotlib.legend.Legend at 0x122492110>

The figure below can be used in the midterm report.


In [17]:
g = sns.FacetGrid(df, hue='fuel', size=5, aspect=1.5, ylim=(-0.05,1))
g.map(plt.scatter, 'NAMEPLATE_CAPACITY(MW)', 'Capacity Factor')
plt.title('2007 ERCOT Fossil Fuel Power Plants')
plt.legend(labels=['Subbituminous coal', 'Lignite coal', 'Natural gas', 'Petroleum coke', 'Diesel fuel'])
plt.tight_layout()
path = os.path.join('Midterm figures', 'ERCOT power plants 2007.svg')
plt.savefig(path)


Out[17]:
<seaborn.axisgrid.FacetGrid at 0x115d20d10>
Out[17]:
<matplotlib.text.Text at 0x1188a3550>
Out[17]:
<matplotlib.legend.Legend at 0x11559aa50>

In [ ]:

EIA 923 data


In [6]:
filename = 'EIA 923.pkl'
path = '../Clean Data'
fullpath = os.path.join(path, filename)

eia923 = pickle.load(open(fullpath, 'rb'))

In [7]:
all923 = pd.concat(eia923)

filename = 'All EIA 923.csv'
path = '../Clean Data'
fullpath = os.path.join(path, filename)
all860.to_csv(fullpath)

In [35]:
eia923.keys()


Out[35]:
[2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]

In [39]:
[(idx, x) for idx, x in enumerate(eia923[2007].columns)]


Out[39]:
[(0, 'PLANT_ID'),
 (1, 'PLANT_NAME'),
 (2, 'YEAR'),
 (3, u'TOTAL_FUEL_CONSUMPTION_QUANTITY'),
 (4, u'ELEC_FUEL_CONSUMPTION_QUANTITY'),
 (5, u'TOTAL_FUEL_CONSUMPTION_MMBTU'),
 (6, u'ELEC_FUEL_CONSUMPTION_MMBTU'),
 (7, u'NET_GENERATION_(MEGAWATTHOURS)'),
 (8, u'MMBTU_PER_UNIT_JAN'),
 (9, u'MMBTU_PER_UNIT_FEB'),
 (10, u'MMBTU_PER_UNIT_MAR'),
 (11, u'MMBTU_PER_UNIT_APR'),
 (12, u'MMBTU_PER_UNIT_MAY'),
 (13, u'MMBTU_PER_UNIT_JUN'),
 (14, u'MMBTU_PER_UNIT_JUL'),
 (15, u'MMBTU_PER_UNIT_AUG'),
 (16, u'MMBTU_PER_UNIT_SEP'),
 (17, u'MMBTU_PER_UNIT_OCT'),
 (18, u'MMBTU_PER_UNIT_NOV'),
 (19, u'MMBTU_PER_UNIT_DEC'),
 (20, u'TOT_MMBTU_JAN'),
 (21, u'TOT_MMBTU_FEB'),
 (22, u'TOT_MMBTU_MAR'),
 (23, u'TOT_MMBTU_APR'),
 (24, u'TOT_MMBTU_MAY'),
 (25, u'TOT_MMBTU_JUN'),
 (26, u'TOT_MMBTU_JUL'),
 (27, u'TOT_MMBTU_AUG'),
 (28, u'TOT_MMBTU_SEP'),
 (29, u'TOT_MMBTU_OCT'),
 (30, u'TOT_MMBTU_NOV'),
 (31, u'TOT_MMBTU_DEC'),
 (32, u'NETGEN_JAN'),
 (33, u'NETGEN_FEB'),
 (34, u'NETGEN_MAR'),
 (35, u'NETGEN_APR'),
 (36, u'NETGEN_MAY'),
 (37, u'NETGEN_JUN'),
 (38, u'NETGEN_JUL'),
 (39, u'NETGEN_AUG'),
 (40, u'NETGEN_SEP'),
 (41, u'NETGEN_OCT'),
 (42, u'NETGEN_NOV'),
 (43, u'NETGEN_DEC'),
 (44, u'ELEC_FUEL_CONSUMPTION_MMBTU %AB'),
 (45, u'ELEC_FUEL_CONSUMPTION_MMBTU %BIT'),
 (46, u'ELEC_FUEL_CONSUMPTION_MMBTU %BLQ'),
 (47, u'ELEC_FUEL_CONSUMPTION_MMBTU %DFO'),
 (48, u'ELEC_FUEL_CONSUMPTION_MMBTU %JF'),
 (49, u'ELEC_FUEL_CONSUMPTION_MMBTU %LFG'),
 (50, u'ELEC_FUEL_CONSUMPTION_MMBTU %LIG'),
 (51, u'ELEC_FUEL_CONSUMPTION_MMBTU %MWH'),
 (52, u'ELEC_FUEL_CONSUMPTION_MMBTU %NG'),
 (53, u'ELEC_FUEL_CONSUMPTION_MMBTU %NUC'),
 (54, u'ELEC_FUEL_CONSUMPTION_MMBTU %OBG'),
 (55, u'ELEC_FUEL_CONSUMPTION_MMBTU %OBL'),
 (56, u'ELEC_FUEL_CONSUMPTION_MMBTU %OBS'),
 (57, u'ELEC_FUEL_CONSUMPTION_MMBTU %OG'),
 (58, u'ELEC_FUEL_CONSUMPTION_MMBTU %OTH'),
 (59, u'ELEC_FUEL_CONSUMPTION_MMBTU %PC'),
 (60, u'ELEC_FUEL_CONSUMPTION_MMBTU %PUR'),
 (61, u'ELEC_FUEL_CONSUMPTION_MMBTU %RFO'),
 (62, u'ELEC_FUEL_CONSUMPTION_MMBTU %SC'),
 (63, u'ELEC_FUEL_CONSUMPTION_MMBTU %SUB'),
 (64, u'ELEC_FUEL_CONSUMPTION_MMBTU %SUN'),
 (65, u'ELEC_FUEL_CONSUMPTION_MMBTU %WAT'),
 (66, u'ELEC_FUEL_CONSUMPTION_MMBTU %WDS'),
 (67, u'ELEC_FUEL_CONSUMPTION_MMBTU %WH'),
 (68, u'ELEC_FUEL_CONSUMPTION_MMBTU %WND'),
 (69, u'ELEC_FUEL_CONSUMPTION_MMBTU %WO'),
 (70, u'ELEC_FUEL_CONSUMPTION_QUANTITY %AB'),
 (71, u'ELEC_FUEL_CONSUMPTION_QUANTITY %BIT'),
 (72, u'ELEC_FUEL_CONSUMPTION_QUANTITY %BLQ'),
 (73, u'ELEC_FUEL_CONSUMPTION_QUANTITY %DFO'),
 (74, u'ELEC_FUEL_CONSUMPTION_QUANTITY %JF'),
 (75, u'ELEC_FUEL_CONSUMPTION_QUANTITY %LFG'),
 (76, u'ELEC_FUEL_CONSUMPTION_QUANTITY %LIG'),
 (77, u'ELEC_FUEL_CONSUMPTION_QUANTITY %MWH'),
 (78, u'ELEC_FUEL_CONSUMPTION_QUANTITY %NG'),
 (79, u'ELEC_FUEL_CONSUMPTION_QUANTITY %NUC'),
 (80, u'ELEC_FUEL_CONSUMPTION_QUANTITY %OBG'),
 (81, u'ELEC_FUEL_CONSUMPTION_QUANTITY %OBL'),
 (82, u'ELEC_FUEL_CONSUMPTION_QUANTITY %OBS'),
 (83, u'ELEC_FUEL_CONSUMPTION_QUANTITY %OG'),
 (84, u'ELEC_FUEL_CONSUMPTION_QUANTITY %OTH'),
 (85, u'ELEC_FUEL_CONSUMPTION_QUANTITY %PC'),
 (86, u'ELEC_FUEL_CONSUMPTION_QUANTITY %PUR'),
 (87, u'ELEC_FUEL_CONSUMPTION_QUANTITY %RFO'),
 (88, u'ELEC_FUEL_CONSUMPTION_QUANTITY %SC'),
 (89, u'ELEC_FUEL_CONSUMPTION_QUANTITY %SUB'),
 (90, u'ELEC_FUEL_CONSUMPTION_QUANTITY %SUN'),
 (91, u'ELEC_FUEL_CONSUMPTION_QUANTITY %WAT'),
 (92, u'ELEC_FUEL_CONSUMPTION_QUANTITY %WDS'),
 (93, u'ELEC_FUEL_CONSUMPTION_QUANTITY %WH'),
 (94, u'ELEC_FUEL_CONSUMPTION_QUANTITY %WND'),
 (95, u'ELEC_FUEL_CONSUMPTION_QUANTITY %WO'),
 (96, u'NET_GENERATION_(MEGAWATTHOURS) %AB'),
 (97, u'NET_GENERATION_(MEGAWATTHOURS) %BIT'),
 (98, u'NET_GENERATION_(MEGAWATTHOURS) %BLQ'),
 (99, u'NET_GENERATION_(MEGAWATTHOURS) %DFO'),
 (100, u'NET_GENERATION_(MEGAWATTHOURS) %JF'),
 (101, u'NET_GENERATION_(MEGAWATTHOURS) %LFG'),
 (102, u'NET_GENERATION_(MEGAWATTHOURS) %LIG'),
 (103, u'NET_GENERATION_(MEGAWATTHOURS) %MWH'),
 (104, u'NET_GENERATION_(MEGAWATTHOURS) %NG'),
 (105, u'NET_GENERATION_(MEGAWATTHOURS) %NUC'),
 (106, u'NET_GENERATION_(MEGAWATTHOURS) %OBG'),
 (107, u'NET_GENERATION_(MEGAWATTHOURS) %OBL'),
 (108, u'NET_GENERATION_(MEGAWATTHOURS) %OBS'),
 (109, u'NET_GENERATION_(MEGAWATTHOURS) %OG'),
 (110, u'NET_GENERATION_(MEGAWATTHOURS) %OTH'),
 (111, u'NET_GENERATION_(MEGAWATTHOURS) %PC'),
 (112, u'NET_GENERATION_(MEGAWATTHOURS) %PUR'),
 (113, u'NET_GENERATION_(MEGAWATTHOURS) %RFO'),
 (114, u'NET_GENERATION_(MEGAWATTHOURS) %SC'),
 (115, u'NET_GENERATION_(MEGAWATTHOURS) %SUB'),
 (116, u'NET_GENERATION_(MEGAWATTHOURS) %SUN'),
 (117, u'NET_GENERATION_(MEGAWATTHOURS) %WAT'),
 (118, u'NET_GENERATION_(MEGAWATTHOURS) %WDS'),
 (119, u'NET_GENERATION_(MEGAWATTHOURS) %WH'),
 (120, u'NET_GENERATION_(MEGAWATTHOURS) %WND'),
 (121, u'NET_GENERATION_(MEGAWATTHOURS) %WO'),
 (122, u'TOTAL_FUEL_CONSUMPTION_MMBTU %AB'),
 (123, u'TOTAL_FUEL_CONSUMPTION_MMBTU %BIT'),
 (124, u'TOTAL_FUEL_CONSUMPTION_MMBTU %BLQ'),
 (125, u'TOTAL_FUEL_CONSUMPTION_MMBTU %DFO'),
 (126, u'TOTAL_FUEL_CONSUMPTION_MMBTU %JF'),
 (127, u'TOTAL_FUEL_CONSUMPTION_MMBTU %LFG'),
 (128, u'TOTAL_FUEL_CONSUMPTION_MMBTU %LIG'),
 (129, u'TOTAL_FUEL_CONSUMPTION_MMBTU %MWH'),
 (130, u'TOTAL_FUEL_CONSUMPTION_MMBTU %NG'),
 (131, u'TOTAL_FUEL_CONSUMPTION_MMBTU %NUC'),
 (132, u'TOTAL_FUEL_CONSUMPTION_MMBTU %OBG'),
 (133, u'TOTAL_FUEL_CONSUMPTION_MMBTU %OBL'),
 (134, u'TOTAL_FUEL_CONSUMPTION_MMBTU %OBS'),
 (135, u'TOTAL_FUEL_CONSUMPTION_MMBTU %OG'),
 (136, u'TOTAL_FUEL_CONSUMPTION_MMBTU %OTH'),
 (137, u'TOTAL_FUEL_CONSUMPTION_MMBTU %PC'),
 (138, u'TOTAL_FUEL_CONSUMPTION_MMBTU %PUR'),
 (139, u'TOTAL_FUEL_CONSUMPTION_MMBTU %RFO'),
 (140, u'TOTAL_FUEL_CONSUMPTION_MMBTU %SC'),
 (141, u'TOTAL_FUEL_CONSUMPTION_MMBTU %SUB'),
 (142, u'TOTAL_FUEL_CONSUMPTION_MMBTU %SUN'),
 (143, u'TOTAL_FUEL_CONSUMPTION_MMBTU %WAT'),
 (144, u'TOTAL_FUEL_CONSUMPTION_MMBTU %WDS'),
 (145, u'TOTAL_FUEL_CONSUMPTION_MMBTU %WH'),
 (146, u'TOTAL_FUEL_CONSUMPTION_MMBTU %WND'),
 (147, u'TOTAL_FUEL_CONSUMPTION_MMBTU %WO'),
 (148, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %AB'),
 (149, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %BIT'),
 (150, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %BLQ'),
 (151, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %DFO'),
 (152, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %JF'),
 (153, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %LFG'),
 (154, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %LIG'),
 (155, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %MWH'),
 (156, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %NG'),
 (157, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %NUC'),
 (158, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %OBG'),
 (159, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %OBL'),
 (160, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %OBS'),
 (161, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %OG'),
 (162, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %OTH'),
 (163, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %PC'),
 (164, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %PUR'),
 (165, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %RFO'),
 (166, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %SC'),
 (167, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %SUB'),
 (168, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %SUN'),
 (169, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %WAT'),
 (170, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %WDS'),
 (171, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %WH'),
 (172, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %WND'),
 (173, u'TOTAL_FUEL_CONSUMPTION_QUANTITY %WO')]

Define a list of fuel types


In [45]:
fuels = [text[29:] for text in eia923[2007].columns[44:70]]

In [46]:
fuels


Out[46]:
[u'AB',
 u'BIT',
 u'BLQ',
 u'DFO',
 u'JF',
 u'LFG',
 u'LIG',
 u'MWH',
 u'NG',
 u'NUC',
 u'OBG',
 u'OBL',
 u'OBS',
 u'OG',
 u'OTH',
 u'PC',
 u'PUR',
 u'RFO',
 u'SC',
 u'SUB',
 u'SUN',
 u'WAT',
 u'WDS',
 u'WH',
 u'WND',
 u'WO']

In [51]:
eia923[2007].iloc[0,44:70].idxmax()[29:]


Out[51]:
u'WAT'

In [10]:
def top_fuel(row):
    #Fraction of largest fuel for electric heat input 
    try:
        fuel = row.iloc[44:70].idxmax()[29:]
    except:
        print row.iloc[44:70]
        return None
    return fuel

In [68]:
top_fuel(eia923[2007].loc[0,:])


Out[68]:
u'WAT'

In [76]:
eia923[2007].apply(top_fuel, axis=1).value_counts()


Out[76]:
NG     123
WND     28
WAT     21
LFG     10
SUB      8
AB       7
LIG      6
DFO      3
PC       3
NUC      2
PUR      1
BLQ      1
WH       1
OBL      1
OTH      1
dtype: int64

In [82]:
eia923[2011].apply(top_fuel, axis=1).value_counts()


ELEC_FUEL_CONSUMPTION_QUANTITY %PUR       0
ELEC_FUEL_CONSUMPTION_QUANTITY %RFO       0
ELEC_FUEL_CONSUMPTION_QUANTITY %SC        0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUB       0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUN       0
ELEC_FUEL_CONSUMPTION_QUANTITY %WAT       0
ELEC_FUEL_CONSUMPTION_QUANTITY %WDS       0
ELEC_FUEL_CONSUMPTION_QUANTITY %WH        0
ELEC_FUEL_CONSUMPTION_QUANTITY %WND       0
ELEC_FUEL_CONSUMPTION_QUANTITY %WO        0
MMBTU_PER_UNIT_APR                        .
MMBTU_PER_UNIT_AUG                        .
MMBTU_PER_UNIT_DEC                        .
MMBTU_PER_UNIT_FEB                        0
MMBTU_PER_UNIT_JAN                        0
MMBTU_PER_UNIT_JUL                        .
MMBTU_PER_UNIT_JUN                        .
MMBTU_PER_UNIT_MAR                        0
MMBTU_PER_UNIT_MAY                        .
MMBTU_PER_UNIT_NOV                        .
MMBTU_PER_UNIT_OCT                        .
MMBTU_PER_UNIT_SEP                        .
NETGEN_APR                                .
NETGEN_AUG                                .
NETGEN_DEC                                .
NETGEN_FEB                             -160
Name: 19, dtype: object
ELEC_FUEL_CONSUMPTION_QUANTITY %PUR    0
ELEC_FUEL_CONSUMPTION_QUANTITY %RFO    0
ELEC_FUEL_CONSUMPTION_QUANTITY %SC     0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUB    0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUN    0
ELEC_FUEL_CONSUMPTION_QUANTITY %WAT    0
ELEC_FUEL_CONSUMPTION_QUANTITY %WDS    0
ELEC_FUEL_CONSUMPTION_QUANTITY %WH     0
ELEC_FUEL_CONSUMPTION_QUANTITY %WND    0
ELEC_FUEL_CONSUMPTION_QUANTITY %WO     0
MMBTU_PER_UNIT_APR                     .
MMBTU_PER_UNIT_AUG                     .
MMBTU_PER_UNIT_DEC                     0
MMBTU_PER_UNIT_FEB                     .
MMBTU_PER_UNIT_JAN                     .
MMBTU_PER_UNIT_JUL                     .
MMBTU_PER_UNIT_JUN                     .
MMBTU_PER_UNIT_MAR                     .
MMBTU_PER_UNIT_MAY                     .
MMBTU_PER_UNIT_NOV                     .
MMBTU_PER_UNIT_OCT                     .
MMBTU_PER_UNIT_SEP                     .
NETGEN_APR                             .
NETGEN_AUG                             .
NETGEN_DEC                             0
NETGEN_FEB                             .
Name: 159, dtype: object
ELEC_FUEL_CONSUMPTION_QUANTITY %PUR       0
ELEC_FUEL_CONSUMPTION_QUANTITY %RFO       0
ELEC_FUEL_CONSUMPTION_QUANTITY %SC        0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUB       0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUN       0
ELEC_FUEL_CONSUMPTION_QUANTITY %WAT       0
ELEC_FUEL_CONSUMPTION_QUANTITY %WDS       1
ELEC_FUEL_CONSUMPTION_QUANTITY %WH        0
ELEC_FUEL_CONSUMPTION_QUANTITY %WND       0
ELEC_FUEL_CONSUMPTION_QUANTITY %WO        0
MMBTU_PER_UNIT_APR                        .
MMBTU_PER_UNIT_AUG                        .
MMBTU_PER_UNIT_DEC                       11
MMBTU_PER_UNIT_FEB                        .
MMBTU_PER_UNIT_JAN                        .
MMBTU_PER_UNIT_JUL                        .
MMBTU_PER_UNIT_JUN                        .
MMBTU_PER_UNIT_MAR                        .
MMBTU_PER_UNIT_MAY                        .
MMBTU_PER_UNIT_NOV                      8.3
MMBTU_PER_UNIT_OCT                     8.92
MMBTU_PER_UNIT_SEP                     8.74
NETGEN_APR                                .
NETGEN_AUG                                .
NETGEN_DEC                             5346
NETGEN_FEB                                .
Name: 170, dtype: object
ELEC_FUEL_CONSUMPTION_QUANTITY %PUR        0
ELEC_FUEL_CONSUMPTION_QUANTITY %RFO        0
ELEC_FUEL_CONSUMPTION_QUANTITY %SC         0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUB        0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUN        0
ELEC_FUEL_CONSUMPTION_QUANTITY %WAT        0
ELEC_FUEL_CONSUMPTION_QUANTITY %WDS        0
ELEC_FUEL_CONSUMPTION_QUANTITY %WH         0
ELEC_FUEL_CONSUMPTION_QUANTITY %WND        0
ELEC_FUEL_CONSUMPTION_QUANTITY %WO         0
MMBTU_PER_UNIT_APR                         .
MMBTU_PER_UNIT_AUG                         .
MMBTU_PER_UNIT_DEC                         0
MMBTU_PER_UNIT_FEB                         .
MMBTU_PER_UNIT_JAN                         .
MMBTU_PER_UNIT_JUL                         .
MMBTU_PER_UNIT_JUN                         .
MMBTU_PER_UNIT_MAR                         .
MMBTU_PER_UNIT_MAY                         .
MMBTU_PER_UNIT_NOV                         0
MMBTU_PER_UNIT_OCT                         .
MMBTU_PER_UNIT_SEP                         .
NETGEN_APR                                 .
NETGEN_AUG                                 .
NETGEN_DEC                             11999
NETGEN_FEB                                 .
Name: 202, dtype: object
ELEC_FUEL_CONSUMPTION_QUANTITY %PUR     0
ELEC_FUEL_CONSUMPTION_QUANTITY %RFO     0
ELEC_FUEL_CONSUMPTION_QUANTITY %SC      0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUB     0
ELEC_FUEL_CONSUMPTION_QUANTITY %SUN     0
ELEC_FUEL_CONSUMPTION_QUANTITY %WAT     0
ELEC_FUEL_CONSUMPTION_QUANTITY %WDS     0
ELEC_FUEL_CONSUMPTION_QUANTITY %WH      0
ELEC_FUEL_CONSUMPTION_QUANTITY %WND     0
ELEC_FUEL_CONSUMPTION_QUANTITY %WO      0
MMBTU_PER_UNIT_APR                     ..
MMBTU_PER_UNIT_AUG                     ..
MMBTU_PER_UNIT_DEC                     ..
MMBTU_PER_UNIT_FEB                      0
MMBTU_PER_UNIT_JAN                      0
MMBTU_PER_UNIT_JUL                     ..
MMBTU_PER_UNIT_JUN                     ..
MMBTU_PER_UNIT_MAR                     ..
MMBTU_PER_UNIT_MAY                     ..
MMBTU_PER_UNIT_NOV                     ..
MMBTU_PER_UNIT_OCT                     ..
MMBTU_PER_UNIT_SEP                     ..
NETGEN_APR                             ..
NETGEN_AUG                             ..
NETGEN_DEC                             ..
NETGEN_FEB                              0
Name: 248, dtype: object
Out[82]:
          239
Y %PUR     13
dtype: int64

In [11]:
eia923[2007]['fuel'] = eia923[2007].apply(top_fuel, axis=1)

In [84]:
eia923[2007]


Out[84]:
PLANT_ID PLANT_NAME YEAR TOTAL_FUEL_CONSUMPTION_QUANTITY ELEC_FUEL_CONSUMPTION_QUANTITY TOTAL_FUEL_CONSUMPTION_MMBTU ELEC_FUEL_CONSUMPTION_MMBTU NET_GENERATION_(MEGAWATTHOURS) MMBTU_PER_UNIT_JAN MMBTU_PER_UNIT_FEB ... TOTAL_FUEL_CONSUMPTION_QUANTITY %RFO TOTAL_FUEL_CONSUMPTION_QUANTITY %SC TOTAL_FUEL_CONSUMPTION_QUANTITY %SUB TOTAL_FUEL_CONSUMPTION_QUANTITY %SUN TOTAL_FUEL_CONSUMPTION_QUANTITY %WAT TOTAL_FUEL_CONSUMPTION_QUANTITY %WDS TOTAL_FUEL_CONSUMPTION_QUANTITY %WH TOTAL_FUEL_CONSUMPTION_QUANTITY %WND TOTAL_FUEL_CONSUMPTION_QUANTITY %WO fuel
0 3584 H 5 2007 0.0 0.0 109110.0 109110.0 1.103900e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
1 3585 Nolte 2007 0.0 0.0 118993.0 118993.0 1.203900e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
2 3586 TP 4 2007 0.0 0.0 142735.0 142735.0 1.444100e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
3 6147 Monticello 2007 11807274.0 11807274.0 171279117.0 171279117.0 1.539900e+07 34.452 34.281 ... 0.0 0.0 0.529492 0.0 0.0 0.0 0.0 0.0 0.0 SUB
4 6145 Comanche Peak 2007 0.0 0.0 197597659.0 197597659.0 1.884575e+07 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NUC
5 3594 Austin 2007 0.0 0.0 414654.0 414654.0 4.195200e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
6 3595 Buchanan 2007 0.0 0.0 381949.0 381949.0 3.864300e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
7 3597 Granite Shoals 2007 0.0 0.0 746904.0 746904.0 7.556700e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
8 3598 Inks 2007 0.0 0.0 149072.0 149072.0 1.508200e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
9 3599 Marble Falls 2007 0.0 0.0 373120.0 373120.0 3.775000e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
10 3600 Marshall Ford 2007 0.0 0.0 1950628.0 1950628.0 1.973520e+05 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
11 3601 Sim Gideon 2007 11630630.0 11630630.0 11685020.0 11685020.0 1.001793e+06 1.025 1.005 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
12 10243 Celanese Engineering Resin 2007 1372291.0 154240.0 1399737.0 157326.0 1.070000e+04 1.020 1.020 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
13 10261 Enterprise Products Operating 2007 2859283.0 863771.0 2950779.0 891411.0 1.884800e+05 1.032 1.032 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
14 55320 Wise County Power LP 2007 19108619.0 19108619.0 19445694.0 19445694.0 2.554023e+06 2.034 2.028 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
15 3609 Leon Creek 2007 1213072.0 1213072.0 1237933.0 1237933.0 1.114660e+05 1.019 1.032 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
16 3611 O W Sommers 2007 15285685.0 15285685.0 15605865.0 15605865.0 1.402671e+06 6.897 6.901 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
17 3612 V H Braunig 2007 9560220.0 9560220.0 9779034.0 9779034.0 8.889870e+05 1.021 1.021 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
18 3613 W B Tuttle 2007 229910.0 229910.0 234195.0 234195.0 1.309700e+04 1.020 1.021 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
19 56350 Colorado Bend Energy Center 2007 2536664.0 2536664.0 2603911.0 2603911.0 2.990230e+05 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
20 55327 Baytown Energy Center 2007 37553180.0 37307395.0 38430302.0 38179364.0 4.618128e+06 1.027 1.023 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
21 7200 Robert D Willis 2007 0.0 0.0 40744.0 40744.0 4.122000e+03 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
22 54817 Johnson County 2007 10148810.0 10148810.0 10018948.0 10018948.0 1.291123e+06 0.996 0.995 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
23 6178 Coleto Creek 2007 2510370.0 2510370.0 44712930.0 44712930.0 4.217795e+06 18.651 19.014 ... 0.0 0.0 1.000000 0.0 0.0 0.0 0.0 0.0 0.0 SUB
24 6179 Fayette Power Project 2007 7354561.0 7354561.0 124326800.0 124326800.0 1.214436e+07 22.951 22.740 ... 0.0 0.0 0.997591 0.0 0.0 0.0 0.0 0.0 0.0 SUB
25 6181 J T Deely 2007 3418439.0 3418439.0 56432194.0 56432194.0 5.320426e+06 23.963 23.969 ... 0.0 0.0 0.974821 0.0 0.0 0.0 0.0 0.0 0.0 SUB
26 10790 Victoria Texas Plant 2007 12317700.0 2161649.0 12785773.0 2243791.0 5.025590e+05 1.038 1.038 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
27 6183 San Miguel 2007 3193875.0 3193875.0 33657590.0 33657590.0 2.713947e+06 16.200 16.430 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 LIG
28 3624 Weatherford 2007 0.0 0.0 0.0 0.0 0.000000e+00 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 AB
29 3627 North Texas 2007 1344.0 1344.0 2057.0 2057.0 1.010000e+02 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
186 10692 ExxonMobil Baytown Turbine 2007 30962452.0 13370932.0 31512948.0 13609173.0 2.457731e+06 1.021 1.019 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
187 50118 Hal C Weaver Power Plant 2007 4437176.0 2349982.0 4533109.0 2401487.0 3.372360e+05 2.040 2.042 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
188 10184 Central Utility Plant 2007 292229.0 265263.0 303561.0 275554.0 1.308000e+04 6.820 6.820 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
189 50121 Valero Refinery Corpus Christi West 2007 2255638.0 2255519.0 4111325.0 4107255.0 2.709148e+05 70.460 70.460 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 PC
190 56270 Callahan Divide Wind Energy Center 2007 0.0 0.0 3669810.0 3669810.0 3.712880e+05 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WND
191 50127 Signal Hill Wichita Falls Power LP 2007 355243.0 355243.0 361505.0 361505.0 3.937400e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
192 52176 C R Wing Cogen Plant 2007 5864293.0 4749841.0 5977688.0 4841953.0 6.282564e+05 2.000 2.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
193 6136 Gibbons Creek 2007 2076191.0 2076191.0 34629582.0 34629582.0 3.442748e+06 17.778 17.856 ... 0.0 0.0 0.984584 0.0 0.0 0.0 0.0 0.0 0.0 SUB
194 3492 Morgan Creek 2007 1981742.0 1981742.0 2066391.0 2066391.0 1.495690e+05 6.835 1.019 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
195 10203 Valero Refinery Corpus Christi East 2007 3825259.0 1090684.0 3825259.0 1090684.0 2.350285e+05 1.000 1.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
196 3548 Decker Creek 2007 12392426.0 12392426.0 12888644.0 12888644.0 1.152770e+06 7.918 2.074 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
197 3549 Holly Street 2007 4292025.0 4292025.0 4396301.0 4396301.0 3.785590e+05 1.022 1.037 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
198 10569 Big Spring Texas Refinery 2007 0.0 0.0 0.0 0.0 0.000000e+00 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 AB
199 56291 Horse Hollow Wind Energy Center 2007 0.0 0.0 18295077.0 18295077.0 1.850979e+06 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WND
200 3494 Permian Basin 2007 5409329.0 5409329.0 5628121.0 5628121.0 3.961010e+05 7.876 2.070 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
201 50150 Union Carbide Seadrift Cogen 2007 7544790.0 6808386.0 7900924.0 7040075.0 6.393765e+05 3.700 3.700 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
202 3559 Silas Ray 2007 516629.0 516629.0 538598.0 538598.0 5.169900e+04 7.584 7.584 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
203 50153 Texas City Plant Union Carbide 2007 1226429.0 922958.0 1954749.0 1645207.0 1.860630e+05 1.020 1.020 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
204 54253 S&L Cogeneration 2007 1525748.0 624870.0 1542532.0 631743.0 1.191870e+05 1.011 1.011 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
205 6128 Amistad Dam & Power 2007 0.0 0.0 1466845.0 1466845.0 1.484060e+05 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
206 55795 Woodward Mountain II 2007 0.0 0.0 1630158.0 1630158.0 1.649290e+05 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WND
207 55796 Woodward Mountain I 2007 0.0 0.0 1742953.0 1742953.0 1.763410e+05 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WND
208 10741 Clear Lake Cogeneration Ltd 2007 5420230.0 2952281.0 5517795.0 3005422.0 5.383190e+05 1.018 1.018 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
209 3574 C E Newman 2007 3634.0 3634.0 3707.0 3707.0 4.840000e+02 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
210 56311 Sweetwater Wind 3 LLC 2007 0.0 0.0 4012963.0 4012963.0 4.060060e+05 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WND
211 3576 Ray Olinger 2007 3404835.0 3404835.0 3529341.0 3529341.0 2.799860e+05 6.580 1.020 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 NG
212 56602 Snyder Wind Farm 2007 0.0 0.0 63643.0 63643.0 6.439000e+03 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WND
213 3581 Abbott TP 3 2007 0.0 0.0 155926.0 155926.0 1.577565e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
214 3582 Dunlap TP 1 2007 0.0 0.0 229082.0 229082.0 2.317700e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT
215 3583 H 4 2007 0.0 0.0 115247.0 115247.0 1.166000e+04 0.000 0.000 ... 0.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 WAT

216 rows × 175 columns


In [88]:
eia923[2007].groupby('fuel').sum()['NET_GENERATION_(MEGAWATTHOURS)'].plot(kind='bar')
plt.ylabel('Net Generation in 2007 (MWh)')


Out[88]:
<matplotlib.axes._subplots.AxesSubplot at 0x11a7c6590>
Out[88]:
<matplotlib.text.Text at 0x11bf8b1d0>

In [90]:
eia923[2009]['fuel'] = eia923[2009].apply(top_fuel, axis=1)
eia923[2009].groupby('fuel').sum()['NET_GENERATION_(MEGAWATTHOURS)'].plot(kind='bar')
plt.ylabel('Net Generation in 2009 (MWh)')


Out[90]:
<matplotlib.axes._subplots.AxesSubplot at 0x11c106a90>
Out[90]:
<matplotlib.text.Text at 0x11aa0be90>

EPA hourly data

Load the zipped pickle file. Takes a couple min, but at least the data file is less than 60 MB.


In [9]:
# load a zipped pickle file 
# from http://stackoverflow.com/questions/18474791/decreasing-the-size-of-cpickle-objects
def load_zipped_pickle(filename):
    with gzip.open(filename, 'rb') as f:
        loaded_object = pickle.load(f)
        return loaded_object

In [10]:
filename = 'EPA hourly dictionary.pgz'
path = '../Clean Data'
fullpath = os.path.join(path, filename)

epaDict = load_zipped_pickle(fullpath)

In [11]:
df = epaDict['2015 July-Dec']
set(df['PLANT_ID'])


Out[11]:
{127,
 298,
 3439,
 3441,
 3443,
 3452,
 3453,
 3460,
 3464,
 3468,
 3469,
 3470,
 3476,
 3477,
 3478,
 3490,
 3491,
 3492,
 3494,
 3497,
 3504,
 3507,
 3548,
 3559,
 3576,
 3601,
 3611,
 3612,
 3628,
 3631,
 4266,
 4937,
 4939,
 6136,
 6139,
 6147,
 6178,
 6179,
 6180,
 6181,
 6183,
 6243,
 6648,
 7030,
 7097,
 7325,
 7900,
 8063,
 52176,
 55015,
 55091,
 55139,
 55153,
 55215,
 55223,
 56350,
 56674}

In [12]:
df_temp = df[df['PLANT_ID'].isin([127, 298, 3439])].fillna(0)

In [13]:
df_temp.head()


Out[13]:
STATE FACILITY NAME PLANT_ID YEAR DATE HOUR GROSS LOAD (MW) STEAM LOAD (1000LB/HR) SO2 (POUNDS) NOX (POUNDS) CO2 (SHORT TONS) EPA REGION NERC REGION COUNTY HEAT INPUT (MMBTU) FACILITY LATITUDE FACILITY LONGITUDE DATETIME
79488 TX Laredo 3439 2015 2015-07-01 0 0.0 0.0 0.0 0.0 0.0 6 ERCOT Webb 0.0 27.5667 -99.5083 2015-07-01 00:00:00
79489 TX Laredo 3439 2015 2015-07-01 1 0.0 0.0 0.0 0.0 0.0 6 ERCOT Webb 0.0 27.5667 -99.5083 2015-07-01 01:00:00
79490 TX Laredo 3439 2015 2015-07-01 2 0.0 0.0 0.0 0.0 0.0 6 ERCOT Webb 0.0 27.5667 -99.5083 2015-07-01 02:00:00
79491 TX Laredo 3439 2015 2015-07-01 3 0.0 0.0 0.0 0.0 0.0 6 ERCOT Webb 0.0 27.5667 -99.5083 2015-07-01 03:00:00
79492 TX Laredo 3439 2015 2015-07-01 4 0.0 0.0 0.0 0.0 0.0 6 ERCOT Webb 0.0 27.5667 -99.5083 2015-07-01 04:00:00

In [17]:
g = sns.FacetGrid(df_temp, col='PLANT_ID', sharey=False)
g.map(plt.plot, 'DATETIME', 'GROSS LOAD (MW)')
g.set_xticklabels(rotation=30)

path = os.path.join('Midterm figures', 'Sample hourly load.svg')
plt.savefig(path)


Out[17]:
<seaborn.axisgrid.FacetGrid at 0x11cb76f50>
Out[17]:
<seaborn.axisgrid.FacetGrid at 0x11cb76f50>

In [ ]: